- NAME
- perlref - Perl references and nested data structures
-
- DESCRIPTION
- Before release 5 of Perl it was difficult to represent
- complex data structures, because all references had to be
- symbolic, and even that was difficult to do when you
- wanted to refer to a variable rather than a symbol table
- entry. Perl 5 not only makes it easier to use symbolic
- references to variables, but lets you have "hard"
- references to any piece of data. Any scalar may hold a
- hard reference. Since arrays and hashes contain scalars,
- you can now easily build arrays of arrays, arrays of
- hashes, hashes of arrays, arrays of hashes of functions,
- and so on.
-
- Hard references are smart--they keep track of reference
- counts for you, automatically freeing the thing referred
- to when its reference count goes to zero. If that thing
- happens to be an object, the object is destructed. See
- the perlobj manpage for more about objects. (In a sense,
- everything in Perl is an object, but we usually reserve
- the word for references to objects that have been
- officially "blessed" into a class package.)
-
- A symbolic reference contains the name of a variable, just
- as a symbolic link in the filesystem merely contains the
- name of a file. The *glob notation is a kind of symbolic
- reference. Hard references are more like hard links in
- the filesystem: merely another way of getting at the same
- underlying object, irrespective of its name.
-
- "Hard" references are easy to use in Perl. There is just
- one overriding principle: Perl does no implicit
- referencing or dereferencing. When a scalar is holding a
- reference, it always behaves as a scalar. It doesn't
- magically start being an array or a hash unless you tell
- it so explicitly by dereferencing it.
-
- References can be constructed several ways.
-
- 1. By using the backslash operator on a variable,
- subroutine, or value. (This works much like the &
- (address-of) operator works in C.) Note that this
- typically creates ANOTHER reference to a variable,
- since there's already a reference to the variable in
- the symbol table. But the symbol table reference
- might go away, and you'll still have the reference
- that the backslash returned. Here are some examples:
-
- $scalarref = \$foo;
- $arrayref = \@ARGV;
- $hashref = \%ENV;
- $coderef = \&handler;
- $globref = \*STDOUT;
-
- 2. A reference to an anonymous array can be constructed
- using square brackets:
-
- $arrayref = [1, 2, ['a', 'b', 'c']];
-
- Here we've constructed a reference to an anonymous
- array of three elements whose final element is itself a
- reference to another anonymous array of three
- elements. (The multidimensional syntax described
- later can be used to access this. For example, after
- the above, $arrayref->[2][1] would have the value
- "b".)
-
- Note that taking a reference to an enumerated list is
- not the same as using square brackets--instead it's
- the same as creating a list of references!
-
- @list = (\$a, \$b, \$c);
- @list = \($a, $b, $c); # same thing!
-
- 3. A reference to an anonymous hash can be constructed
- using curly brackets:
-
- $hashref = {
-     'Adam'  => 'Eve',
-     'Clyde' => 'Bonnie',
- };
-
- Anonymous hash and array constructors can be
- intermixed freely to produce as complicated a
- structure as you want. The multidimensional syntax
- described below works for these too. The values above
- are literals, but variables and expressions would work
- just as well, because assignment operators in Perl
- (even within local() or my()) are executable
- statements, not compile-time declarations.
-
- Because curly brackets (braces) are used for several
- other things including BLOCKs, you may occasionally
- have to disambiguate braces at the beginning of a
- statement by putting a + or a return in front so that
- Perl realizes the opening brace isn't starting a
- BLOCK. The economy and mnemonic value of using
- curlies is deemed worth this occasional extra hassle.
-
- For example, if you wanted a function to make a new
- hash and return a reference to it, you have these
- options:
-
- sub hashem { { @_ } } # silently wrong
- sub hashem { +{ @_ } } # ok
- sub hashem { return { @_ } } # ok
-
- 4. A reference to an anonymous subroutine can be
- constructed by using sub without a subname:
-
- $coderef = sub { print "Boink!\n" };
-
- Note the presence of the semicolon. Except for the
- fact that the code inside isn't executed immediately,
- a sub {} is not so much a declaration as it is an
- operator, like do{} or eval{}. (However, no matter
- how many times you execute that line (unless you're in
- an eval("...")), $coderef will still have a reference
- to the SAME anonymous subroutine.)
-
- Anonymous subroutines act as closures with respect to
- my() variables, that is, variables visible lexically
- within the current scope. Closure is a notion out of
- the Lisp world that says if you define an anonymous
- function in a particular lexical context, it pretends
- to run in that context even when it's called outside
- of the context.
-
- In human terms, it's a funny way of passing arguments
- to a subroutine when you define it as well as when you
- call it. It's useful for setting up little bits of
- code to run later, such as callbacks. You can even do
- object-oriented stuff with it, though Perl provides a
- different mechanism to do that already--see the
- perlobj manpage.
-
- You can also think of closure as a way to write a
- subroutine template without using eval. (In fact, in
- version 5.000, eval was the only way to get closures.
- You may wish to use "require 5.001" if you use
- closures.)
-
- Here's a small example of how closures work:
-
- sub newprint {
-     my $x = shift;
-     return sub { my $y = shift; print "$x, $y!\n"; };
- }
- $h = newprint("Howdy");
- $g = newprint("Greetings");
-
- # Time passes...
-
- &$h("world");
- &$g("earthlings");
-
- This prints
-
- Howdy, world!
- Greetings, earthlings!
-
- Note particularly that $x continues to refer to the
- value passed into newprint() despite the fact that the
- "my $x" has seemingly gone out of scope by the time
- the anonymous subroutine runs. That's what closure is
- all about.
-
- This only applies to lexical variables, by the way.
- Dynamic variables continue to work as they have always
- worked. Closure is not something that most Perl
- programmers need trouble themselves about to begin
- with.
-
- 5. References are often returned by special subroutines
- called constructors. Perl objects are just references
- to a special kind of object that happens to know which
- package it's associated with. Constructors are just
- special subroutines that know how to create that
- association. They do so by starting with an ordinary
- reference, and it remains an ordinary reference even
- while it's also being an object. Constructors are
- customarily named new(), but don't have to be:
-
- $objref = new Doggie (Tail => 'short', Ears => 'long');
-
- 6. References of the appropriate type can spring into
- existence if you dereference them in a context that
- assumes they exist. Since we haven't talked about
- dereferencing yet, we can't show you any examples yet.
-
- 7. References to filehandles can be created by taking a
- reference to a typeglob. This is currently the best
- way to pass filehandles into or out of subroutines, or
- to store them in larger data structures.
-
- splutter(\*STDOUT);
- sub splutter {
-     my $fh = shift;
-     print $fh "her um well a hmmm\n";
- }
-
- $rec = get_rec(\*STDIN);
- sub get_rec {
-     my $fh = shift;
-     return scalar <$fh>;
- }
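
As a recap of the construction methods listed above, here is a minimal sketch that builds one reference of each common kind and reports its type. The names used (@colors, %limits, $count, handler, and the arguments passed to hashem) are invented for illustration, and ref() is borrowed from later in this document only to show what was built.

```perl
use strict;
use warnings;

# Hypothetical data, invented for this illustration.
my @colors = ('red', 'green');
my %limits = (max => 10);
my $count  = 3;
sub handler { return "handled @_" }

# Method 1: backslash on named things.
my $scalarref = \$count;
my $arrayref  = \@colors;
my $hashref   = \%limits;
my $coderef   = \&handler;

# Methods 2 and 3: anonymous array and hash constructors, nested freely.
my $nested = {
    list => [1, 2, ['a', 'b', 'c']],
    pair => { Adam => 'Eve' },
};

# Method 4: anonymous subroutine.
my $anon = sub { return "Boink!" };

# Backslash on a parenthesized list of scalars gives a list of
# references, one per element, not a reference to an array.
my ($x, $y, $z) = (1, 2, 3);
my @refs = \($x, $y, $z);

# Returning an anonymous hash: "return" keeps the braces from being
# parsed as a BLOCK.
sub hashem { return { @_ } }
my $made = hashem(Clyde => 'Bonnie');

my @kinds = map { ref($_) } ($scalarref, $arrayref, $hashref,
                             $coderef, $nested, $anon);
print "@kinds\n";   # prints "SCALAR ARRAY HASH CODE HASH CODE"
print scalar(@refs), " scalar refs; inner element is $nested->{list}[2][1]\n";
```

Only scalars are used in the \(...) example on purpose, so that each element of @refs is plainly a scalar reference.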
-
- That's it for creating references. By now you're probably
- dying to know how to use references to get back to your
- long-lost data. There are several basic methods.
-
- 1. Anywhere you'd put an identifier as part of a variable
- or subroutine name, you can replace the identifier
- with a simple scalar variable containing a reference
- of the correct type:
-
- $bar = $$scalarref;
- push(@$arrayref, $filename);
- $$arrayref[0] = "January";
- $$hashref{"KEY"} = "VALUE";
- &$coderef(1,2,3);
- print $globref "output\n";
-
- It's important to understand that we are specifically
- NOT dereferencing $arrayref[0] or $hashref{"KEY"}
- there. The dereference of the scalar variable happens
- BEFORE it does any key lookups. Anything more
- complicated than a simple scalar variable must use
- methods 2 or 3 below. However, a "simple scalar"
- includes an identifier that itself uses method 1
- recursively. Therefore, the following prints "howdy".
-
- $refrefref = \\\"howdy";
- print $$$$refrefref;
-
- 2. Anywhere you'd put an identifier as part of a variable
- or subroutine name, you can replace the identifier
- with a BLOCK returning a reference of the correct
- type. In other words, the previous examples could be
- written like this:
-
- $bar = ${$scalarref};
- push(@{$arrayref}, $filename);
- ${$arrayref}[0] = "January";
- ${$hashref}{"KEY"} = "VALUE";
- &{$coderef}(1,2,3);
- $globref->print("output\n"); # iff you use FileHandle
-
- Admittedly, it's a little silly to use the curlies in
- this case, but the BLOCK can contain any arbitrary
- expression, in particular, subscripted expressions:
-
- &{ $dispatch{$index} }(1,2,3); # call correct routine
-
- Because the curlies can be omitted in the simple case
- of $$x, people often make the mistake of viewing the
- dereferencing symbols as proper operators,
- and wonder about their precedence. If they were,
- though, you could use parens instead of braces.
- That's not the case. Consider the difference below;
- case 0 is a short-hand version of case 1, NOT case 2:
-
- $$hashref{"KEY"} = "VALUE"; # CASE 0
- ${$hashref}{"KEY"} = "VALUE"; # CASE 1
- ${$hashref{"KEY"}} = "VALUE"; # CASE 2
- ${$hashref->{"KEY"}} = "VALUE"; # CASE 3
-
- Case 2 is also deceptive in that you're accessing a
- variable called %hashref, not dereferencing through
- $hashref to the hash it's presumably referencing.
- That would be case 3.
-
- 3. The case of individual array elements arises often
- enough that it gets cumbersome to use method 2. As a
- form of syntactic sugar, the two subscripted
- assignments shown above can be written:
-
- $arrayref->[0] = "January";
- $hashref->{"KEY"} = "VALUE";
-
- The left side of the arrow can be any expression
- returning a reference, including a previous
- dereference. Note that $array[$x] is NOT the same
- thing as $array->[$x] here:
-
- $array[$x]->{"foo"}->[0] = "January";
-
- This is one of the cases we mentioned earlier in which
- references could spring into existence when in an
- lvalue context. Before this statement, $array[$x] may
- have been undefined. If so, it's automatically
- defined with a hash reference so that we can look up
- {"foo"} in it. Likewise $array[$x]->{"foo"} will
- automatically get defined with an array reference so
- that we can look up [0] in it.
-
- One more thing here. The arrow is optional BETWEEN
- bracket subscripts, so you can shrink the above down
- to
-
- $array[$x]{"foo"}[0] = "January";
-
- Which, in the degenerate case of using only ordinary
- arrays, gives you multidimensional arrays just like
- C's:
-
- $score[$x][$y][$z] += 42;
-
- Well, okay, not entirely like C's arrays, actually. C
- doesn't know how to grow its arrays on demand. Perl
- does.
-
- 4. If a reference happens to be a reference to an object,
- then there are probably methods to access the things
- referred to, and you should probably stick to those
- methods unless you're in the class package that
- defines the object's methods. In other words, be
- nice, and don't violate the object's encapsulation
- without a very good reason. Perl does not enforce
- encapsulation. We are not totalitarians here. We do
- expect some basic civility though.
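
The dereferencing methods above can be compared side by side in a short sketch. The data and the names (@table, %dispatch, and the month strings) are hypothetical, chosen only to show that methods 1, 2 and 3 reach the same storage and that intermediate references spring into existence in lvalue context.

```perl
use strict;
use warnings;

my $arrayref = ['December', 'February'];
my $hashref  = {};

# Method 1: a simple scalar variable in place of the identifier.
$$arrayref[0] = 'January';

# Method 2: a BLOCK returning the reference.
${$hashref}{KEY} = 'VALUE';
push @{$arrayref}, 'March';

# Method 3: the arrow form reads back what methods 1 and 2 stored.
my $first = $arrayref->[0];
my $value = $hashref->{KEY};

# References springing into existence: @table starts empty, yet the
# hash and array references are created on demand by the assignment.
my @table;
$table[2]{foo}[0] = 'January';

# A subscripted expression is not a "simple scalar", so calling
# through a dispatch table needs the BLOCK form.
my %dispatch = (
    double => sub { return 2 * $_[0] },
    square => sub { return $_[0] ** 2 },
);
my $doubled = &{ $dispatch{double} }(21);
my $squared = &{ $dispatch{square} }(5);

print "$first $value $doubled $squared\n";   # prints "January VALUE 42 25"
```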
-
- The ref() operator may be used to determine what type of
- thing the reference is pointing to. See the perlfunc
- manpage.
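
As a rough illustration of what ref() reports, the sketch below applies it to a handful of made-up values. The sample list is an assumption for demonstration, not an exhaustive catalogue of return values.

```perl
use strict;
use warnings;

my @samples = (
    \"text",          # reference to a scalar
    [1, 2],           # anonymous array
    { a => 1 },       # anonymous hash
    sub { 1 },        # anonymous subroutine
    \\"nested",       # reference to a reference
    \*STDOUT,         # reference to a typeglob
    "plain string",   # not a reference at all
);

# ref() returns an empty string for anything that is not a reference.
my @kinds = map { ref($_) } @samples;
print join(",", @kinds), "\n";   # prints "SCALAR,ARRAY,HASH,CODE,REF,GLOB,"
```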
-
- The bless() operator may be used to associate a reference
- with a package functioning as an object class. See the
- perlobj manpage.
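
A minimal sketch of bless() in a constructor follows. The Doggie class echoes the constructor example earlier in this document, but its body and the tail() method are invented here; the call is written as Doggie->new(...), which is an alternative spelling of "new Doggie (...)".

```perl
use strict;
use warnings;

package Doggie;   # hypothetical class, for illustration only

sub new {
    my ($class, %args) = @_;
    my $self = { %args };         # an ordinary hash reference...
    return bless $self, $class;   # ...now associated with package Doggie
}

# Hypothetical accessor method.
sub tail { return $_[0]{Tail} }

package main;

my $objref = Doggie->new(Tail => 'short', Ears => 'long');
print ref($objref), "\n";      # prints "Doggie"
print $objref->tail, "\n";     # prints "short"
print $objref->{Ears}, "\n";   # prints "long"
```

The last line reaches into the hash directly, which works because the object remains an ordinary reference, but code outside the class would normally go through a method instead.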
-
- A typeglob may be dereferenced the same way a reference
- can, since the dereference syntax always indicates the
- kind of reference desired. So ${*foo} and ${\$foo} both
- indicate the same scalar variable.
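
The equivalence can be checked with a small sketch, assuming a package variable named foo (typeglobs only cover package variables, so "our" is used rather than "my"):

```perl
use strict;
use warnings;

our $foo = "package scalar";
our @foo = (1, 2, 3);

my $via_glob  = ${*foo};    # scalar slot of the typeglob *foo
my $via_ref   = ${\$foo};   # through a hard reference to $foo
my @from_glob = @{*foo};    # array slot of the same typeglob

# References to the same variable compare equal numerically.
my $same = (\${*foo} == \$foo) ? "same" : "different";
print "$via_glob / $via_ref / @from_glob / $same\n";
```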
-
- Here's a trick for interpolating a subroutine call into a
- string:
-
- print "My sub returned @{[mysub(1,2,3)]} that time.\n";
-
- The way it works is that when the @{...} is seen in the
- double-quoted string, it's evaluated as a block. The
- block creates a reference to an anonymous array containing
- the results of the call to mysub(1,2,3). So the whole
- block returns a reference to an array, which is then
- dereferenced by @{...} and stuck into the double-quoted
- string. This chicanery is also useful for arbitrary
- expressions:
-
- print "That yields @{[$n + 5]} widgets\n";
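
To make the trick concrete, here is a sketch with a stand-in mysub() that doubles its arguments; the subroutine body and the value of $n are assumptions made up for the example.

```perl
use strict;
use warnings;

# Hypothetical subroutine returning a list.
sub mysub { return map { $_ * 2 } @_ }

my $n = 37;

# The list elements are joined with a space when interpolated.
my $line1 = "My sub returned @{[ mysub(1,2,3) ]} that time.";
my $line2 = "That yields @{[ $n + 5 ]} widgets";

print "$line1\n";   # prints "My sub returned 2 4 6 that time."
print "$line2\n";   # prints "That yields 42 widgets"
```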
-
- Symbolic references
-
- We said that references spring into existence as necessary
- if they are undefined, but we didn't say what happens if a
- value used as a reference is already defined, but ISN'T a
- hard reference. If you use it as a reference in this
- case, it'll be treated as a symbolic reference. That is,
- the value of the scalar is taken to be the NAME of a
- variable, rather than a direct link to a (possibly)
- anonymous value.
-
- People frequently expect it to work like this. So it
- does.
-
- $name = "foo";
- $$name = 1; # Sets $foo
- ${$name} = 2; # Sets $foo
- ${$name x 2} = 3; # Sets $foofoo
- $name->[0] = 4; # Sets $foo[0]
- @$name = (); # Clears @foo
- &$name(); # Calls &foo() (as in Perl 4)
- $pack = "THAT";
- ${"${pack}::$name"} = 5; # Sets $THAT::foo without eval
-
- This is very powerful, and slightly dangerous, in that
- it's possible to intend (with the utmost sincerity) to use
- a hard reference, and accidentally use a symbolic
- reference instead. To protect against that, you can say
-
- use strict 'refs';
-
- and then only hard references will be allowed for the rest
- of the enclosing block. An inner block may countermand
- that with
-
- no strict 'refs';
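
The effect of the pragma can be observed with a sketch like the following, which assumes a package variable $foo and traps the run-time error with eval so the script keeps going:

```perl
use strict;
use warnings;

our $foo  = 0;
my  $name = "foo";

# With strict refs in force, using a string as a reference is a
# run-time error; eval {} traps it and leaves the message in $@.
my $ok    = eval { my $v = $$name; 1 };
my $error = $ok ? '' : $@;

{
    no strict 'refs';   # countermanded for this inner block only
    $$name = 1;         # symbolic reference: sets the package $foo
}

print "trapped: $error";
print "foo is now $foo\n";   # prints "foo is now 1"
```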
-
- Only package variables are visible to symbolic references.
- Lexical variables (declared with my()) aren't in a symbol
- table, and thus are invisible to this mechanism. For
- example:
-
- local($value) = 10;
- $ref = "value";
- {
-     my $value = 20;
-     print $$ref;
- }
-
- This will still print 10, not 20. Remember that local()
- affects package variables, which are all "global" to the
- package.
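
The same point can be checked under strict with an explicitly symbolic lookup. In this sketch the names $value, $name and $seen are illustrative, and "our" stands in for the package variable:

```perl
use strict;
use warnings;

our $value = 10;       # package variable, lives in the symbol table
my $seen;
{
    my $value = 20;    # lexical: hides the name, but has no symbol table entry
    my $name  = "value";
    no strict 'refs';
    $seen = $$name;    # symbolic lookup finds the package variable
}
print "$seen\n";       # prints "10"
```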
-
- Not-so-symbolic references
-
- A new feature contributing to readability in 5.001 is that
- the brackets around a symbolic reference behave more like
- quotes, just as they always have within a string. That
- is,
-
- $push = "pop on ";
- print "${push}over";
-
- has always meant to print "pop on over", despite the fact
- that push is a reserved word. This has been generalized
- to work the same outside of quotes, so that
-
- print ${push} . "over";
-
- and even
-
- print ${ push } . "over";
-
- will have the same effect. (This would have been a syntax
- error in 5.000, though Perl 4 allowed it in the spaceless
- form.) Note that this construct is not considered to be a
- symbolic reference when you're using strict refs:
-
- use strict 'refs';
- ${ bareword }; # Okay, means $bareword.
- ${ "bareword" }; # Error, symbolic reference.
-
- Similarly, because of all the subscripting that is done
- using single words, we've applied the same rule to any
- bareword that is used for subscripting a hash. So now,
- instead of writing
-
- $array{ "aaa" }{ "bbb" }{ "ccc" }
-
- you can just write
-
- $array{ aaa }{ bbb }{ ccc }
-
- and not worry about whether the subscripts are reserved
- words. In the rare event that you do wish to do something
- like
-
- $array{ shift }
-
- you can force interpretation as a reserved word by adding
- anything that makes it more than a bareword:
-
- $array{ shift() }
- $array{ +shift }
- $array{ shift @_ }
-
- The -w switch will warn you if it interprets a reserved
- word as a string. But it will no longer warn you about
- using lowercase words, since the string is effectively
- quoted.
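
The difference between a bareword subscript and a forced function call can be seen in a sketch such as this one; the hash %table, its contents and the lookup() subroutine are hypothetical.

```perl
use strict;
use warnings;

my %table = (
    shift => 'literal key',
    first => 'from argument list',
);

sub lookup {
    my $as_string = $table{ shift };     # the string "shift"; @_ is untouched
    my $as_call   = $table{ shift() };   # really calls shift on @_
    return ($as_string, $as_call);
}

my ($literal, $shifted) = lookup('first');
print "$literal\n";   # prints "literal key"
print "$shifted\n";   # prints "from argument list"
```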
-
- WARNING
- You may not (usefully) use a reference as the key to a
- hash. It will be converted into a string:
-
- $x{ \$a } = $a;
-
- If you try to dereference the key, it won't do a hard
- dereference, and you won't accomplish what you're
- attempting. You might want to do something more like
-
- $r = \@a;
- $x{ $r } = $r;
-
- And then at least you can use the values(), which will be
- real refs, instead of the keys(), which won't.
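
A short sketch shows what survives on each side of the hash. The variable names follow the snippet above, with $key and $value added for illustration:

```perl
use strict;
use warnings;

my @a = (1, 2, 3);
my $r = \@a;
my %x;
$x{ $r } = $r;

my ($key)   = keys %x;     # a string such as "ARRAY(0x...)"
my ($value) = values %x;   # still a real array reference

print ref($key)   ? "key is a reference\n"   : "key is a plain string: $key\n";
print ref($value) ? "value is a reference\n" : "value is a plain string\n";

# The value can be dereferenced; the key cannot lead back to @a.
push @$value, 4;
print scalar(@a), "\n";    # prints "4"
```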
-
- SEE ALSO
- Besides the obvious documents, source code can be
- instructive. Some rather pathological examples of the use
- of references can be found in the t/op/ref.t regression
- test in the Perl source directory.
-
- See also the perldsc manpage and the perllol manpage for
- how to use references to create complex data structures,
- and the perlobj manpage for how to use them to create
- objects.
-